Evidence of a putative CO2 delivery system to the chromatophore in the photosynthetic amoeba Paulinella

Abstract The photosynthetic amoeba, Paulinella provides a recent (ca. 120 Mya) example of primary plastid endosymbiosis. Given the extensive data demonstrating host lineage‐driven endosymbiont integration, we analysed nuclear genome and transcriptome data to investigate mechanisms that may have evolved in Paulinella micropora KR01 (hereinafter, KR01) to maintain photosynthetic function in the novel organelle, the chromatophore. The chromatophore is of α‐cyanobacterial provenance and has undergone massive gene loss due to Muller's ratchet, but still retains genes that encode the ancestral α‐carboxysome and the shell carbonic anhydrase, two critical components of the biophysical CO2 concentrating mechanism (CCM) in cyanobacteria. We identified KR01 nuclear genes potentially involved in the CCM that arose via duplication and divergence and are upregulated in response to high light and downregulated under elevated CO2. We speculate that these genes may comprise a novel CO2 delivery system (i.e., a biochemical CCM) to promote the turnover of the RuBisCO carboxylation reaction and counteract photorespiration. We posit that KR01 has an inefficient photorespiratory system that cannot fully recycle the C2 product of RuBisCO oxygenation back to the Calvin‐Benson cycle. Nonetheless, both these systems appear to be sufficient to allow Paulinella to persist in environments dominated by faster‐growing phototrophs.


INTRODUCTION
Photosynthetic Paulinella, a rhizarian amoeba and a recent example of primary plastid endosymbiosis (ca.120 Mya), represents a model system for studying the earlier stages of organellogenesis relative to the much longer evolutionary history (>1.6 Bya) of the Archaeplastida (Bhattacharya et al., 2023;Delaye et al., 2016;Gabr et al., 2020;Lhee et al., 2019;Nowack et al., 2008;Stephens et al., 2021;Van Etten et al., 2023).The photosynthetic organelle in Paulinella, referred to as the chromatophore, supplies its host with energy in the form of reduced carbon, allowing this previously heterotrophic lineage to transition to a fully photoautotrophic lifestyle (Kies, 1974;Kies & Kremer, 1979).However, Paulinella is extremely slow growing, with a doubling time of 5-7 days in replete culture medium (Zhang et al., 2017).In addition, these cells can only grow under low light (10-20 μmol photons m À2 s À1 ) (Zhang et al., 2017), whereas freeliving Synechococcus spp.(putative chromatophore ancestor, and member of the α-cyanobacterial clade) have a maximum doubling rate of >1.4 times/day and tolerate a much broader range of light intensities (30-2000 μmol photons m À2 s À1 ) (Kana & Glibert, 1987).A previous study showed that all genes necessary for photosynthesis are present in either (or both) the chromatophore and nuclear genomes of Paulinella, suggesting that the loss of photosynthesis-related genes is unlikely to explain the slow growth phenotype (Gabr, Zournas, et al., 2022).
Factors potentially limiting the growth rate of Paulinella include poor metabolic connectivity (metabolite exchange) between the chromatophore and host that is exacerbated by exposure to high-intensity light.This may impair overall metabolism and cause elevated intracellular redox potentials and the formation of reactive oxygen species that result in damage to Paulinella cellular processes (Stephens et al., 2021;Zhang et al., 2017).In addition, the transport of host, cytoplasmically synthesized proteins into the chromatophore may be suboptimal.'Long' cytoplasmically-synthesized proteins (i.e., >268 amino acids [aa] in length) in Paulinella that are targeted to the chromatophore contain a unique $200 aa N-terminal extension (known as a chromatophore transit peptide [crTP]) that can target proteins to the chloroplast in tobacco (Singer et al., 2017).Proteins harbouring this N-terminal extension are therefore most likely to be chromatophoretargeted.In contrast, 'short' chromatophore-targeted proteins (<90 aa in length; lacking a crTP) are transported across the chromatophore membrane via a separate mechanism that appears to involve the secretory pathway (Nowack & Grossman, 2012).Finally, analysis of enriched insoluble protein fractions derived from the chromatophore inner membrane does not provide evidence of canonical, nucleus-encoded, organelletargeted metabolite transporters, as is the case for Archaeplastida (Karkar et al., 2015).Rather, short chromatophore-targeted orphan proteins (related to octotricopeptide repeats) may modulate membrane permeability (Oberleitner et al., 2020), in which case, metabolite exchange between the host and endosymbiont may have low efficiency.
Given these observations, it is of interest to investigate mechanisms that may have arisen (and are likely still evolving) in Paulinella to support its obligate photosynthetic lifestyle and provide insights into its growth phenotype.These innovations will most likely occur in the host nuclear, rather than the chromatophore genome, which is undergoing Muller's ratchet, resulting in massive genome reduction (Stephens et al., 2021).Past work also shows that key chromatophore genes involved in photosynthesis do not show light regulation (Zhang et al., 2017).Therefore, there is a presumptive added burden on the host to balance photosynthesis and photorespiration in an organelle which encodes genes that may have unregulated expression (see below).
In this regard, a key constraint faced by photoautotrophs is the need to efficiently fix CO 2 via the Calvin-Benson cycle of the C 3 pathway (von Caemmerer & Evans, 2010;Yamori et al., 2014).The enzyme ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCO) is central to this cycle and incorporates CO 2 into 3-phosphoglyceric acid (3-PGA) (Johnson, 2016), which is reduced to dihydroxyacetone phosphate/phosphoglyceraldehyde.These products can be converted into glucose, other sugars, and fixed carbon polymers that drive many downstream metabolic processes (e.g., amino acid and lipid biosynthesis), or are used for energy storage (starch, triacylglyceride).However, cyanobacterial RuBisCOs have very low affinities for CO 2 (K CO2 >200 μM; Moroney & Somanchi, 1999), and O 2 in the atmosphere competes with CO 2 for binding to the active site of this enzyme (Giordano et al., 2005).RuBisCO can catalyse the oxygenation of ribulose-1,-5-bisphosphate (oxygenase activity) converting it into 3-PGA and 2-phosphoglycolate, with the latter channelled into the photorespiratory C 2 cycle.Photorespiration leads to a loss of fixed carbon and energy, resulting in a $30% decrease in photosynthetic efficiency (Sage, 2004).This issue is most severe under high light (HL), elevated temperature, and desiccation conditions.In addition, the 2-phosphoglycolate synthesized due to RuBisCO oxygenase activity is toxic, inhibiting enzymes associated with photosynthetic metabolism, and is removed from the cell by an energyintensive multi-step enzymatic reaction (Sheen, 1999).
To counteract photorespiration, various mechanisms (known as the CO 2 concentrating mechanism [CCM]) have evolved independently in different lineages, including algae, to concentrate CO 2 in the subcellular compartment that houses RuBisCO (Brautigam & Gowik, 2016;Kupriyanova et al., 2023;Schluter & Weber, 2020;Wei et al., 2019).These are referred to as biophysical or biochemical CCMs based on how CO 2 is delivered to the site of RuBisCO.The former functions by transporting inorganic carbon from the exterior to the interior of the cell, concentrating bicarbonate (HCO 3 À ) in the compartment that houses RuBisCO and converting it to CO 2 (substrate of RuBisCO) via the highly efficient enzyme, carbonic anhydrase (CA; Tsuji et al., 2017).This mechanism increases the rate of photosynthetic CO 2 fixation and is essential to the biophysical CCM of cyanobacteria and microalgae (Badger & Price, 2003;Kupriyanova et al., 2023;Price et al., 1992;Wei et al., 2019).In contrast, biochemical CCMs, present in C 4 plants, temporarily fix CO 2 into a 4-carbon organic intermediate (e.g., malate).These intermediates are transported to the site of RuBisCO where they are decarboxylated to produce pyruvate, concentrated CO 2 , and the reductant [NAD(P)H] (Hennacy & Jonikas, 2020), which allows for efficient CO 2 fixation.
Paulinella inherited a biophysical CCM from its α-cyanobacterial ancestor that relied on an αcarboxysome (i.e., an icosahedral protein shell surrounding RuBisCO and a CA; Hennacy & Jonikas, 2020) to enhance the carboxylase activity of RuBisCO (Cai et al., 2015;Rae et al., 2013).Interestingly, the complete operon (referred to as cso) encoding components of the ancestral α-carboxysome (including form 1A RubisCO) of proteobacterial origin that was acquired via horizontal gene transfer (HGT) in the α-cyanobacterial ancestor is present in the Paulinella chromatophore genome (Marin et al., 2007).Proteomic analysis of the chromatophore demonstrates that α-carboxysome proteins are highly abundant in the organelle fraction (Singer et al., 2017).Key to the activity of this CCM is the production of CO 2 by a CA (csoSCA) which is encoded as part of the cso operon on the chromatophore genome.The only known nucleus-encoded component of the α-carboxysome that arose via endosymbiotic gene transfer (EGT) from the chromatophore genome (Nowack et al., 2011) is CsoS4A (peptide A).This is a pentameric carboxysome shell protein that supports the carboxysome structure by capping its 12 vertices and may function as a CO 2 leakage barrier (Cai et al., 2009).This protein in KR01 is 110 amino acids long, and although not identified in the proteomic analysis of the chromatophore (Singer et al., 2017), may be small enough in size to enter the organelle via the secretory pathway (Nowack & Grossman, 2012).Differential regulation of this nuclear gene provides a potential marker of carboxysome functionality in KR01 because deletion of csoS4A in the sulfur-oxidizing chemoautotroph Halothiobacillus neapolitanus reduces the catalytic activity of RubisCO in isolated carboxysomes, even though the RuBisCO does not appear to be compromised, potentially because of leakage of CO 2 from the carboxysome (Cai et al., 2009).
Given these findings, we searched the nuclear gene inventory of our model species Paulinella micropora strain KR01 (hereinafter, KR01) for clues as to whether this organism has a 'rudimentary' CCM and if so, how the CCM may function in this species.Does KR01 rely solely on an ancestral biophysical CCM encoded by the cso operon, or has a biochemical CCM (analogous to C 4 metabolism) evolved to improve CO 2 delivery to the chromatophore and carboxysome?We focused here on the latter possibility, and analysis of KEGG pathways in the predicted proteome of KR01-identified genes that may be involved in a biochemical CCM.We do not expect, however, that the putative KR01 biochemical CCM would strictly follow the canonical version present in algae and plants (e.g., Kroth et al., 2008;Ouyang et al., 2013;Reinfelder et al., 2000;Stutz et al., 2014;Xu et al., 2012) due to its earlier stage of evolution.To explore the predicted functions of target genes identified in our study, we generated RNA-seq data from cultures incubated under different light and CO 2 conditions for 6 h.Our work relied on polyA-selected cDNA libraries, therefore we have no data regarding the differential expression of chromatophore-encoded genes, such as in the cso operon.Our results suggest that a novel, host-derived CO 2 delivery system may have evolved in KR01, but some key details are still unknown or uncertain.

Cell culture and treatments
The P. micropora KR01 culture used in this study was obtained from Prof. Hwan Su Yoon (Sungkyunkwan University) whose group isolated the cells in 2009 from Mangae Jeosuji (Reservoir), Chungnam Province, South Korea.The KR01 cells were grown at 26 C in Fernbach flasks containing 0.3 L DY-V medium (Andersen et al., 2005), using a 16 h:8 h light/dark cycle (light on/off) at a light intensity of $15-20 μmol photons m À2 s À1 (control conditions; n = 4).Differential gene transcription was examined after incubating cells under various light, CO 2 , and pH conditions for 6 h.For high light treatment (HL; n = 4), cells were immediately shifted to an intensity of 150 μmol photons m À2 s À1 at the beginning of the light cycle.This light intensity was chosen because it is high enough to elicit a highlight-specific response, but not sufficient to cause cell death in the short-term.Note that the medium for control and HL treatments was buffered at pH 6.8 with 10 mM MES (2-(N-morpholino)ethanesulfonic acid) and pre-equilibrated with air (pCO 2 ≈10 À3.4 atm) prior to the addition of cells.For these treatments, the concentration of dissolved CO 2 (CO 2aq ) was 14 μM and of HCO 3 À was 38 μM (see calculations below).For the elevated CO 2 treatment (BC; n = 4), the medium was supplemented with sodium bicarbonate (NaHCO 3 ; final concentration of 5 mM), but was not pre-equilibrated with air, and cells were incubated under control (low) light conditions.Bicarbonate supplementation resulted in a nearly 8-fold increase in the concentration of dissolved CO 2 ([CO 2aq ] = 110 μM, see below).For the high light plus elevated CO 2 (NaHCO 3 ) treatment (HLBC; n = 4), cells were incubated under both high light and supplemental bicarbonate conditions as described above.Because the addition of bicarbonate caused the pH of the culture medium to increase from 6.8 to 8.0, we also conducted treatments (pH; n = 4) in which culture media was adjusted to pH 8.0 using 1 M NaOH and then pre-equilibrated with air prior to the addition of cells.In these treatments, the concentration of CO 2 was the same as in the control and HL treatments (14 μM) and the concentration of HCO 3 À was 600 μM.In all experiments, cells were incubated under the various treatment conditions for 6 h before RNA was harvested.Given the slow rate of KR01 photosynthesis, the carbonate system would not have changed significantly during these short-term incubations.For example, with a growth rate of 0.12 d À1 and a starting cell biomass of 200 μmol carbon/L, KR01 would remove less than 10 μM of dissolved inorganic carbon during a 6-h incubation.
Calculation of CO 2aq and HCO 3 À concentrations Control and HL treatment (pH = 6.8, open system): In Equations ( 3) and (4), Ci is the concentration of total dissolved inorganic carbon.

RNA extraction and RNA-seq
Cells were centrifuged at 4000 rpm for 10 min at 4 C.The resulting pellets were frozen in liquid nitrogen and stored at À80 C until RNA was extracted using a Qiagen RNeasy Plant Mini Kit following the manufacturer's instructions (Qiagen, Hilden, Germany).Total RNA was eluted with 50 μL RNAase-free water and quantified using a Qubit.Library construction and RNAsequencing on a HiSeq instrument 2500 (2 Â 150 bp reagents) was done by Genewiz.

Differential gene expression analysis
RNA-seq reads were trimmed using the CLC Genomics Workbench 8.5.1 (default settings; Qiagen, Hilden, Germany) and individually mapped to the reference genome using this program package.Only the raw "unique exon read" counts were used for downstream analyses (raw count data can be found in Supplemental Table 1).Differential expression analysis was done in RStudio 2021.9.2.382 (Team, 2022) using the DESeq2 package version 1.34.0 (Love et al., 2014).The DESeq2 normalized counts were used for sample-level QC using principal component analysis (PCA) and hierarchical clustering (Supplemental Figure 1).Based on the sample-level QC results, one sample from each treatment (control1, pH 1, HL3, and BC4) was removed because they represented outliers that differed significantly from the other replicate samples; these samples were excluded from downstream analysis.Significant DEGs between treatments were identified using DESeq2.Only genes with a Benjamini-Hochberg adjusted p-value <0.05 were considered for downstream analysis (Love et al., 2014).

Identification of biochemical CCMrelated genes
KAAS (KEGG Automatic Annotation Server: http:// www.genome.jp/kegg/kaas/)(Moriya et al., 2007) was used to assign KEGG Orthology (KO) numbers (K numbers) to the proteins predicted in the KR01 nuclear genome.The assigned KO numbers were used with the KEGG pathway mapper (Kanehisa & Sato, 2020) to generate metabolic maps from the predicted proteins in KR01.Initially, genes related to a biochemical CCM in KR01, identified using KEGG map00710 (Carbon fixation in photosynthetic organisms), were extracted and manually validated.

Functional annotation of KR01 proteins
Functional annotation of all expressed genes was done using BLASTp against the nr database (Lhee et al., 2021).GO terms were assigned to the expressed genes using PANNZER2 (Toronen et al., 2018).DEGs with an adjusted p-value <0.05 were tested for enrichment of GO terms using the topGO R package (Alexa & Rahnenführer, 2010), with all predicted genes in KR01 used as the background for the statistical analysis.The Fisher's Exact test statistic and the 'elimination' algorithm were applied to correct for the hierarchical structure of GO terms.For subcellular localization prediction, signal peptides in the protein sequences were predicted using MITOPROT II (Claros & Vincens, 1996), iPSORT (Bannai et al., 2002), and DeepLoc-1.0(Almagro Armenteros et al., 2017), Tar-getP -2.0 (Emanuelsson et al., 2007), TPpred2 (Savojardo et al., 2014), and CELLO v.2.5 (Yu et al., 2014).The previously generated chromatophoretargeting peptide (crTP) protein database (Lhee et al., 2021) was used to identify putative biochemical CCM-related proteins with a crTP.
Finally, there were two gene copies of alanine aminotransferase (ALT, EC 2.6.1.2),with the encoded proteins predicted to be localized to the mitochondrion (Table 1, Supplemental Table 2).Phylogenetic analysis shows that all the CCM-related genes examined in KR01, except MDH g8927.t1,are of eukaryotic origin (i.e., not the result of recent HGT provenance or EGT) (Supplemental Dataset 1).There are single to multiple copies of many of the KR01 CCM-related genes in the P. chromatophora proteome, that was predicted using a cDNA assembly (Nowack et al., 2016; see Supplemental Dataset 1).These data suggest that the evolutionary trajectories of KR01 and P. chromatophora have diverged significantly since their split (Bhattacharya et al., 2023).Given the lack of a highquality genome assembly for P. chromatophora and no experimental data regarding a putative CCM in this species, we have based the conclusions of the present study solely on KR01.A detailed analysis of P. chromatophora may turn up additional, novel insights into pathways of CCM evolution in the Paulinella lineage.
Model for the putative CO 2 delivery system in KR01 Based solely on the number, functional annotation, and putative subcellular localization of the predicted proteins, we developed a hypothetical model for CO 2 delivery to the chromatophore in KR01 that includes components of a biochemical CCM (Figure 1).This model posits that the PEPC carboxylation reaction is localized in the cytosol/mitochondria, suggesting that OAA is generated in the cytosol before it is transaminated to aspartate by AST or reduced to malate via MDH, with both of these reactions occurring in the cytoplasm or the chromatophore, although there are mitochondrion targeted versions of these enzymes in KR01.Decarboxylation of malate and OAA in the cytosol is possible because of cytosol-localized MAEB and PEPCK activities, which would convert these molecules to pyruvate and PEP (respectively) plus CO 2 .Transfer molecules, in the form of OAA, malate, or aspartate, may enter the chromatophore (mechanism currently unknown) where decarboxylation via the chromatophore targeted PEPCK and MAEB enzymes would release CO 2 in the compartment containing RuBisCO

ENVIRONMENTAL MICROBIOLOGY REPORTS
(Figure 1).Finally, the regeneration of PEP from pyruvate by PPDK likely occurs only in the cytosol.

Quantitative and qualitative transcriptome information
To study the expression patterns of genes related to the putative biochemical CCM proposed here, we generated RNA-seq data from cells exposed to one of four treatments for 6 h: (1) HL (HL; n = 4), used to induce or upregulate the CO 2 delivery system (150 μmol photons m À2 s À1 , pH = 6.8, [CO 2aq ] = 14 μM); (2) bicarbonate (HCO 3 À ) supplementation (BC; n = 4) under low light, which provided a nearly 8-fold increase in the extracellular concentration of CO 2 (pH = 8.0, [CO 2aq ] = 110 μM) relative to our standard batch culture and the HL treatment, and is therefore expected to suppress the synthesis of enzymes involved in a putative biochemical CCM; (3) combined HL and bicarbonate treatment (HLBC, pH = 8.0, [CO 2aq ] = 110 μM, n = 4) to determine if the putative CCM enzymes are downregulated in response to increased CO 2 regardless of light intensity; and (4) low CO 2 /pH 8.0 treatment ([CO 2aq ] = 14 μM, pH = 8.0, n = 4), because the addition of HCO 3 À to cultures resulted in a pH shift from 6.8 to 8.0; this treatment was included to control for any effects of pH on gene expression or biochemical CCM function, independent of a change in external CO 2 .All treatments were compared to samples from a low light (LL), low CO 2 control treatment (Ctrl., 15-20 μmol photons m À2 s À1 , [CO 2aq ] = 14 μM, pH = 6.8, n = 4) (see Materials and Methods).Note that no differences in growth rate were observed among the four treatments or the control over the 6 h incubations used in this study of gene expression.In total, 1,301,208,566 high-quality paired-end reads were retained across all samples after adapter and quality trimming (average of 65 million reads per sample).Of these, 1,060,767,326 (81.52%) were mapped to the KR01 reference genome (Lhee et al., 2021).The "unique exon count" value returned for each gene by the CLC Genomics Workbench read mapping analysis tool was by DESeq2 Version 1.34.0 (Love et al., 2014) to identify differentially expressed genes (DEGs) (Supplemental Tables 1 and 3).
Sample expression profiles were compared using principal component analysis (PCA), with pairwise correlation values constructed using DESeq2 normalized counts.This analysis allowed us to visualize the similarity between each of the samples and to identify any outliers.In general, samples from the same treatment were grouped together in the PCA and distinct from samples from other treatments (Supplemental Figure 1).The BC and HLBC treated samples clustered together when compared to other treatments.This may represent a genuine biological signal (i.e., high similarity in gene expression profiles of samples from the treatments) (Supplemental Figure 1A).Based on their position in the PCA plot, one outlier sample from each treatment group (samples: con-trol1, pH 1, HL3, and BC4) was detected; this effect is F I G U R E 1 Working model of the putative CO 2 delivery system to the chromatophore localized α-carboxysome in photosynthetic Paulinella (KR01).The major processes predicted to occur in the cytoplasm and mitochondria (red box with dashed outline) and the chromatophore (green box) are shown, as are the numbers for each relevant KR01 gene.Dashed arrows represent the predicted direction of transport of metabolites between the cytoplasm and the chromatophore.The α-carboxysome (based on Roberts et al., 2012;Ni et al., 2022) contains multiple RuBisCOs (RuBP) and carbonic anhydrases (CA, for CsoSCA).Created with BioRender.com.also obvious from the results of the pairwise correlation analysis shown in Supplemental Figure 1B.After close inspection of the treatment protocol, sample collection, and RNA extraction protocol and results, it remains unclear which factors led to these samples having expression profiles that are distinct from all other samples exposed to the same treatment.To eliminate biases or excess noise in the analysis, these samples were classified as outliers and removed from the analysis.This resulted in a total of 851,155,573 mapped reads that were subsequently used in the downstream analyses.

Differential gene expression analysis
DEGs from all treatment groups were identified using the DESeq2 package in RStudio 2021.9.2.382 (Team, 2022).In total, 23,995 genes were expressed in the HL plus control samples, of which 10,432 were significantly (adjusted p-value <0.05) differentially expressed between the two treatments; 5120 of these genes were upregulated and 5312 were downregulated in HL.There were 24,621 genes expressed in the BC plus Control samples, of which 4371 were significantly differentially expressed; 1830 were upregulated and 2541 were downregulated in BC.There were 24,899 genes expressed in the HLBC plus control treatment samples, of which only 3899 were significantly differentially expressed; 1393 were upregulated and 2506 were downregulated in HLBC.Finally, there were 24,584 genes expressed in the low CO 2 /pH 8 plus Control treatment samples, of which 8609 were significantly differentially expressed; 4212 were upregulated and 4397 were downregulated in the low CO 2 /pH 8 sample.A total of 1257 significant DEGs were found to be associated with all treatments.A similar number of unique DEGs (i.e., not differentially expressed in other treatments) were identified in the BC and HLBC treatments (374 and 324, respectively), whereas the HL treatment had the largest number of unique DEGs (3955) (Supplemental Figure 2; the overlap in DEGs between treatments, is shown in this figure).

Functional annotation of DEGs
Genes differentially expressed in each treatment were functionally annotated using BLASTp and KAAS (Supplemental Tables 4-7).To understand more about the functions of the DEGs in each treatment, gene ontology (GO) terms were assigned to the predicted proteins using PANNZER2 (Toronen et al., 2018).Significant DEGs (adjusted p-value <0.05) from each treatment were analysed for enrichment of GO terms using the topGO R package (Alexa & Rahnenführer, 2010), using all genes predicted in KR01 as the background; Fisher's exact test statistic and the 'elimination' algorithm were applied to correct for the hierarchical structure of GO terms.The top 30 GO terms from each category (biological processes, cellular component, and molecular function) for each treatment, ordered by Fisher's exact test p-values, are shown in Supplemental Figure 3.Some of the top, significantly enriched biological processes include "translation", "proton transmembrane transport", "protein folding", "glycolytic process", "acyl-CoA biosynthetic process", "cellular protein-containing complex", "glycine metabolic process", "DNA replication", "tricarboxylic acid cycle", "DNA duplex unwinding", "sulfur compound biosynthetic process", "heterochromatin assembly", "cellular response to unfolded protein", and "carbohydrate derivative biosynthesis" (Supplemental Figure 3).The full GO term enrichment results for each treatment are presented in Supplemental Tables 8-11.The majority of KR01 proteins associated with expressed genes have unknown functions.Of the top 20 DEGs with the highest absolute fold-change in each treatment, only one gene in the HL, one gene in the BC, five genes in the HLBC, and one gene in the pH treatment had KO numbers assigned, or top hits to proteins in the NCBI NR database that had unambiguous descriptions (i.e., not "unknown" or "hypothetical" function; Supplemental Tables 4-7).That is, the majority of the top 20 DEGs had unknown functions.Of the top 20 DEGs in each treatment, 10 were shared between HL and pH,two between BC and HLBC,and two between HLBC and pH (14 total,all downregulated).

Differential expression of putative biochemical CCM-related genes
Differential expression analysis was done to determine if and how the putative CCM-related genes in KR01 respond to environmental change.The genes were categorized into different modules based on their functions (Table 1).PPDK, the main enzyme involved in PEP regeneration was significantly upregulated under HL.However, PPDK was also upregulated in the BC and HLBC treatments (Table 1, Figure 3), but was downregulated in response to elevated pH.The upregulation of PPDK at elevated CO 2 is inconsistent with a role for this enzyme in a CO 2 delivery system in Paulinella.In this case, an alternative source of PEP (see PEPCK, below) would be needed to provide a carboxylation substrate for PEPC.
The carboxylation of PEP is catalysed by PEPC using HCO 3 À .The two putatively mitochondriontargeted PEPC proteins were significantly upregulated under HL and significantly downregulated under the BC and HLBC treatments (Table 1, Figure 3), consistent with their role in a CO 2 delivery system.They were also significantly downregulated in the low CO 2 /pH 8.0 treatment.As described above, the rapid conversion of CO 2 to the HCO 3 À used in the carboxylation reaction is facilitated by CA.Eleven CA genes encoding proteins that lack a crTP (and are therefore unlikely to be chromatophore targeted) were identified in the genome of KR01.Six of the genes are α-CA, four are β-CA, and one is a γ-CA.The three containing a putative mitochondrial targeting peptide are two β-CAs (g83283.t1,g51312.t1)and one γ-CA (g60394.t1).Six of these CA genes were significantly upregulated in response to HL with the αcarbonic anhydrase gene, g61179.t1,showing the highest log 2 fold change (3.76).Five of these six CA genes were significantly downregulated under the BC, HLBC, and low CO 2 /pH 8.0 treatments (Table 1, Figure 3).The latter result is not consistent with a CCM.Therefore, specific roles and localization of these CAs merit further research.
In KR01, PEPCK and NADP-ME which perform decarboxylation reactions are each encoded by two genes; one PEPCK and one NADP-ME protein is localized to the chromatophore whereas the other PEPCK and NADP-ME are likely cytosolic (Table 1, Figure 2B).Under HL, both copies of PEPCK were significantly upregulated.However, under BC, the cytosolic copy of PEPCK was upregulated, whereas the crTP-containing PEPCK copy was downregulated (neither was significantly differentially regulated).A similar pattern was observed for NADP-ME with the cytosol localized copy upregulated under both conditions (HL and BC; although not significantly upregulated), whereas the chromatophore targeted copy was significantly upregulated under HL and non-significantly downregulated under BC, as expected if the chromatophore targeted NADP-ME played a role in a biochemical CCM (Table 1, Figure 3).One or both chromatophoretargeted copies of these decarboxylases could produce CO 2 near RuBisCO, with PEPCK also regenerating PEP.
The delivery of CO 2 to the chromatophore would require carrier molecules such as OAA, malate, or aspartate to transfer carbon from the cytosol to the site of RuBisCO.The short-lived OAA molecules produced by PEPC are quickly converted into malate or aspartate by MDH and AST (both of which are encoded on the KR01 genome), before being transported to the site of decarboxylation.There are three genes encoding AST, one of which contains a crTP, a second is putatively mitochondrion targeted, and the third is cytosolic (Figure 1).Two of the AST genes, including the one encoding a crTP (Table 1), were significantly upregulated under HL and showed non-significant upregulation under BC and HLBC treatments.The third, putative mitochondrion-targeted protein (g81617.t1) was significantly upregulated under HL but downregulated under BC (significantly), HLBC (non-significantly), and low CO 2 /pH 8 treatments (nonsignificantly).Four copies of the MDH enzyme were identified in KR01.One is chromatophore targeted (g8927.t1),one is cytosolic (g55981.t1),and two are mitochondrion targeted (g62328.t1,g43748.t1).Genes encoding all but one mitochondrion-targeted protein (g43748.t1)were significantly upregulated under the HL treatment, with the cytosolic copy showing log 2 fold up-regulation of 1.4.However, the expression of these three up-regulated genes under BC and HLBC treatments varied, although they generally showed weak (non-significant) up or downregulation (Table 1, Figure 3).Two genes encoding alanine aminotransferase (ALT) exist in the KR01 genome with the encoded proteins putatively mitochondrion targeted (Table 1).One gene (g57073.t1)appears to be non-significantly downregulated under all conditions, whereas the other (g33965.t1) is significantly upregulated under HL and significantly downregulated under all other treatments.Based on these results, it appears that one or more AST, MDH, and ALT genes have transcription patterns in Paulinella consistent with a role in delivering CO 2 to the chromatophore.

Photorespiratory genes
The C 2 photorespiratory cycle converts the two-carbon product of RuBisCO oxygenation, 2-P-glycolate (2-PG), into glycine and ultimately serine and CO 2 .The decarboxylation of glycine is catalysed by the multi-enzyme glycine decarboxylase complex (GDC).We identified genes for critical proteins of the GDC, including two aminomethyl-transferring glycine dehydrogenases (g40133.t1,g40134.t1)and a glycine cleavage system protein H (g44532.t1) in the KR01 transcriptome data.The expression of both glycine dehydrogenases was not significantly affected by a shift to HL but was significantly downregulated in the high CO 2 treatments (BC and HLBC) and in the low CO 2 /pH 8 treatment.The glycine cleavage system protein H was downregulated under all treatments relative to the low light, low CO 2 and low pH control.These expression patterns are not consistent with a C 2 photorespiratory cycle that increases with oxygen production at high irradiance, but rather, suggest that glycine decarboxylation is upregulated by low external pH.In contrast, alanine-glyoxylate aminotransferase, the enzyme that catalyses the production of glycine from glyoxylate just upstream of the GDC in the C 2 photorespiratory pathway, was upregulated under high light and downregulated under high light plus high CO 2 (HLBC) and low CO 2 /pH 8.This suggests that under high light, a bottleneck may occur at the glycine decarboxylation step leading to a buildup of glycine and inefficient regeneration of 3-PGA for the Calvin-Benson cycle.The KR01 gene expression data lacked all enzymes of the alternative glycerate and decarboxylation photorespiratory pathways (Eisenhut et al., 2008).A gene for a mitochondrial glycine transporter-like protein (g6365.t1)was significantly upregulated under HL but was not significantly affected by HL plus high CO 2 or low CO 2 /pH 8.However, the expression of an alanine glycine permease (g2033.t1)was not significantly regulated by high light or high light plus high CO 2 but was strongly downregulated by low CO 2 /pH 8.Note that both these glycine transporters are not present in the BC transcriptomes.Overall, the expression of C 2 cycle genes indicates that Paulinella lacks or has a C 2 cycle that is uncoordinated with the production of 2-PG produced by RuBisCO oxygenase activity.

Phylogeny of CCM-related proteins in KR01
Phylogenetic analysis of the CCM-related genes was performed to determine whether they are of eukaryotic (host) provenance or originated via HGT or EGT.All these predicted proteins in KR01 are of eukaryotic origin except for the crTP-containing copy of MDH (g8927.t1;Supplemental Dataset 1).This MDH gene is positioned in a wellsupported clade (bootstrap support 100%) with sequences from predominantly alpha-proteobacteria.It should be noted that the MDH and PCK combined geneset trees, as well as the g8927.t1,g55981.t1,and g8928.t1 gene trees did not converge after 2000 iterations.Because the placement of the KR01 genes was congruent across these trees, their topologies were considered robust (for all trees, see Supplemental Datasets 1 and 2).

DISCUSSION
There has been much interest in developing Paulinella as a model system for the analysis of organellogenesis because it is the only known intermediate stage of plastid primary endosymbiosis and has many features associated with "evolution in action" (Calatrava et al., 2022;Gabr et al., 2020;Stephens et al., 2021;Van Etten et al., 2023).Currently, knowledge of its photosynthetic and CO 2 fixation pathways is limited, and we still do not fully understand the mechanisms associated with the adaptation of this species to its photosynthetic organelle.These issues are exacerbated by the extremely slow growth rate of Paulinella cells in culture that requires months of effort to generate sufficient biomass for omics or other analyses.In this study, we find that the KR01 chromatophore genome does not encode a bicarbonate transporter, as would be expected for this cyanobacterial-derived organelle.In contrast, analysis of the chromatophore genome from P. chromatophora (Nowack et al., 2008; NCBI accession NC_011087) and P. longichromatophora (Lhee et al., 2019; NCBI accession MG264610) reveal the presence of an inorganic carbon transporter gene (ictB).This gene is absent from the KR01 chromatophore genome (Lhee et al., 2017; NCBI accession KX897545), although the flanking genes in this region are conserved in P. chromatophora and P. longichromatophora (upstream: phosphotransferase superclass; downstream, tRNA (guanine-N(7)-)-methyltransferase), indicating a deletion event in KR01.Homologues of ictB could not be identified in the nuclear gene inventory of KR01, suggesting the gene has been lost from this lineage.Phylogenetic analysis of IctB demonstrates that the highly diverged chromatophore encoded proteins (potentially indicating weakened selection) are of α-cyanobacterial origin in P. chromatophora and P. longichromatophora (Supplemental Figure 4).Knockdown of ictB demonstrated it as being necessary for HCO 3 À transport in Synechococcus sp.PCC 7942 (Bonfil et al., 1998).Given this result, we propose that the CCM is, as for many other features associated with photosynthesis (see above), in a transitional phase in Paulinella species, with bicarbonate transport in P. chromatophora and P. longichromatophora likely able to support the native α-carboxysome in these taxa.It is possible that the biochemical CCM we propose to exist in KR01 may be the outcome of the loss of ictB (see below), although this is currently speculation.Perhaps most intriguing, different CCMs may be evolving in KR01 and other Paulinella species.Given this knowledge, we report that the KR01 nuclear genome encodes genes that may improve the delivery of CO 2 to the chromatophore via a putative biochemical CCM to enhance the function of the αcarboxysome in this species.The upregulated expression of target genes under HL, and downregulation when experiencing supplementation with bicarbonate (either in LL or HL) are consistent with this hypothesis.

DEG analysis in KR01
Analysis of the expression patterns of KR01 genes involved in the putative biochemical CCM shows that under HL when we would expect higher RuBisCO activity and a higher demand for CO 2 at the site of carbon fixation, many of these genes are significantly upregulated (adjusted p-value <0.05;Table 1, Figure 3).With few exceptions, the opposite is true under BC; the genes are (often significantly) downregulated when bicarbonate is added to the culture medium which increases CO 2 availability and may reduce the need for CCM activity (Table 1).Moreover, genes upregulated under HL are suppressed (i.e., downregulated or not significantly upregulated) under HL with the addition of bicarbonate (the HLBC treatment).This suggests that many of the CO 2 delivery genes in KR01 are responding to the external CO 2 concentration and not directly to light intensity (at the higher light intensity more CO 2 would be fixed, reducing the intracellular CO 2 concentration).Some of these genes also respond to the pH treatment in which the bicarbonate concentration was elevated with respect to the control, suggesting that, as in plants, HCO 3 À might be used as a pH indicator in photosynthetic Paulinella and some of the genes may be responding to changes in pH as well as CO 2 .We also found that, whereas the HL and pH treatments shared the greatest number of significant DEGs, they also had the most unique, significant DEGs (3955 and 2419, respectively).This suggests that HL and pH affect multiple pathways, as expected, and elicit a broad transcriptional response by the cell.In contrast, the BC treatment resulted in only 374 unique DEGs (Supplemental Figure 2).The top 20 genes with the largest absolute fold change in each treatment have unknown functions (Supplemental Tables 4-7).This suggests that in KR01 (and potentially in Paulinella more generally), 'dark' genes of unknown function (e.g., regulatory roles; Stephens et al., 2018) may be of high importance in the stress response.
In contrast to genes associated with the putative biochemical CCM, genes encoding critical components of a putative, ancestral KR01 biophysical CCM did not follow the expected transcription pattern of upregulation under HL and down-regulation in high CO 2 .For example, expression of the single nuclear gene that encodes an α-carboxysome component, csoS4A, did not change markedly under either HL (log 2 fold change = 0.155; adjusted p-value = 0.872) or high CO 2 (log 2 fold change = À0.269;adjusted pvalue = 0.842) when compared to the low light/low CO 2 control.The other treatments (HLBC, pH) also did not result in significant expression differences for this gene.In addition to carboxysome proteins, biophysical CCMs in cyanobacteria depend on bicarbonate transport (Price et al., 2008).Three putative eukaryotic bicarbonate transporters from the solute carrier (SLC) 26 family and none from the SLC4 family were identified in KR01.One SLC26 gene (g61426.t1)was upregulated in both HL and BC treatments, inconsistent with the expected pattern of a CCM-critical gene, whereas the other two were downregulated by HL and did not significantly change in the other treatments (Supplemental Tables 4-7).The putative biochemical CCM in KR01 may therefore support or compensate for an ineffective or poorly regulated biophysical CCM, perhaps due to the loss of the putative cyanobacterial bicarbonate transporter ictB.In that case, the biochemical CCM may deliver inorganic carbon to the chromatophore where a combination of elevated pH and carbonic anhydrase trap it as HCO 3 À , which is subsequently converted to CO 2 and then fixed by carboxysome-encapsulated carbonic anhydrase and RuBisCO, respectively.Such a scenario could be supported by one or more of the nuclear-encoded CAs that were significantly upregulated in response to HL and downregulated under elevated CO 2 , but the localization of these nuclear-encoded CAs and how they may be translocated to the chromatophore is currently unclear.Given that photosynthesis-related genes on the chromatophore genome (e.g., psbA, psaA) appear to lack light regulation (Zhang et al., 2017), the csoS4A result suggests that nuclear control of α-carboxysome activity may also not be light-regulated.This hypothesis extends to the shell carbonic anhydrase, csoSCA, that is key to biophysical CCM function.Nonetheless, the relocation of a key structural component of the αcarboxysome to the nuclear genome may allow the amoeba to control the catalytic activity of this microcompartment (Cai et al., 2009).Such control is consistent with the analysis of nucleotide biosynthesis and multi-protein DNA replication complexes in the chromatophore that all rely on nuclear-encoded proteins (containing crTPs) to complete critical steps in these pathways (Gabr, Stephens, & Bhattacharya, 2022).

The metabolite transporter conundrum
Despite the clear need for intermediate metabolites in the KR01 CO 2 delivery system to cross chromatophore membranes, there is until now, no evidence of such transporters in Paulinella.Only the BASS2/NHD/PPT system (pyruvate and phosphoenolpyruvate transport) and ACC (ADP/ATP) transporters have been identified in KR01, and most of the genes encoding these proteins show significant upregulation under HL (Table 2).Their response to elevated CO 2 is much more variable with only some showing downregulation under BC and HLBC relative to HL treatments, as expected if they supported a CCM.However, most of these genes were non-significantly up or downregulated.Based on the conditions examined here, light regulation of transporters appears to be stronger than CO 2 regulation.Previous work has documented the substantial loss of chromatophore functions due to genome reduction, whereby all existing transporters were lost (Nowack et al., 2008).Given the absence of DiT1/DiT2 transporters in the nuclear genome, Paulinella may be using novel/unknown transporters, or some other mechanism to transport malate, OAA, and aspartate across the chromatophore membrane.The capacity of Paulinella to transport metabolites into and out of the chromatophore remains to be better understood with the current data suggesting that proteins related to octotricopeptide repeats modulate membrane permeability (Oberleitner et al., 2020), potentially in a nonspecific manner.Therefore, despite the interesting patterns of differential gene expression described here, the transport of intermediates (or other metabolites) in photosynthetic Paulinella species needs to be studied in greater detail before the data can be interpreted with confidence and their potential roles in the putative CCM validated.

An alternative model for the putative Paulinella CCM
An open question is: why does the slow-growing Paulinella need to increase CO 2 delivery to the chromatophore via a putative biochemical CCM?In this regard, Marin et al. (2007) suggested that the intracellular environment in P. chromatophora may be CO 2 -rich due to host respiration, and therefore an efficient CCM may not be needed in this species.However, this was conjecture, and they noted the need for experimental data to address this issue.Perhaps more interesting is the observation that these amoebae cannot tolerate even moderate light levels, which suggests that CO 2 limitation may not be constraining photosynthesis.Such light sensitivity is, however, expected in a novel photosynthetic lineage that must manage toxic waste, such as reactive oxygen produced by the photosynthetic electron transport system.For photosynthetic Paulinella, slow growth does not appear to be explained by a poorly functioning photosynthesis machinery (Gabr, Zournas, et al., 2022).Rather, this phenotype is most likely explained by how the host amoeba partitions photosynthesis-derived fixed carbon and reductant and manages photosynthesis-derived waste products (e.g., reactive oxygen species) generated by the efficient cyanobacterial endosymbiont electron transport system (Stephens et al., 2021).We refer to this scenario as the "chassis and engine" model (Stephens et al., 2021) that proposes that the Paulinella host (not the endosymbiont) is under strong selective pressure to evolve novel traits, relative to the heterotrophic ancestor, that may facilitate the continued integration of the chromatophore into host metabolism.It is therefore possible that the different photosynthetic Paulinella hosts evolved (or more likely, are evolving) novel CO 2 delivery systems to reduce the accumulation of toxins that arise from both the oxygenation activity of RuBisCO, which is likely compounded by a seemingly nascent and inefficient C 2 photorespiratory cycle, and the suboptimal export of photosynthetic products from their chromatophores.In both cases, natural selection appears to favour the low light-adapted, slow-growth phenotype that ensures the survival of Paulinella in  aquatic habitats dominated by phototrophs that contain the highly efficient Archaeplastida-derived plastid (e.g., diatoms, dinoflagellates).An "inefficient" CO 2 delivery system may have evolved to allow Paulinella to balance the low levels of O 2 production in low-light habitats with the acquisition of CO 2 needed for carbon fixation, thereby minimizing energetic losses due to photorespiration.This novel trait, which has persisted for millions of years, may also allow Paulinella to survive during times of emersion or partial desiccation, given that a putative novel strain/species was recently described that exists as an epiphyte on the roots of aquatic grasses (Van Etten et al., 2023).
The multiple independent origins of different types of CCMs in microalgae suggest that selection to increase CO 2 concentration in the vicinity of RuBisCO is a powerful driving force in the evolution of photosymbionts (Kupriyanova et al., 2023;Wei et al., 2019).The complex and incomplete (e.g., metabolite transporters) results we report here for KR01 likely indicate that the putative biochemical CCM in this lineage is still evolving and does not have the hallmarks of canonical CO 2 delivery pathways in algae and plants, whose ancestor originated >1.6 Bya.Given this uncertainty, the path forward with Paulinella to test our hypotheses is to use approaches such as proteomics to validate the RNAseq results, immunological approaches to determine the subcellular locations of the putative CCM components, metabolomic profiling, and 13 CO 2 tracing to determine if changes in the expression of putative CCM-related genes correspond with a predictable shift in metabolite pools.Physiological studies with low CO 2acclimated cultures are needed to examine whether carbon fixation is indeed supported by the production and intracellular transport of metabolites arising from the novel CCM and to quantify predicted C 2 cycle bottlenecks that may limit the efficiency of photorespiration.Experimental validation of the subcellular localization of these proteins is also needed to determine the extent of compartmentalization of these functions in Paulinella.
List of central core genes (bold with green highlight) and other closely related genes and their expression under different treatments.The log 2 fold change values that represent significantly (adjusted p-value <0.05) upregulated (orange) or down-regulated (blue) genes are highlighted for each treatment.The predicted localization (Loc) of the proteins is shown (C, cytosolic; CR, chromatophore; M, mitochondrion).CO 2 DELIVERY SYSTEM TO THE PAULINELLA CHROMATOPHORE 7 of 18

F
I G U R E 2 Alignments of PEPC (A) and PPDK (B) proteins from KR01 (in the yellow field) with those in other organisms.(A) The C-terminal end of the PEPC protein alignment is shown with the conserved PEP binding sites (pink), bicarbonate binding site (blue), C-terminal tetrapeptide (green), and the amino acid residues corresponding to position 774 in plant PEPC proteins (red) highlighted.(B) A section of the PPDK protein alignment is shown with the identified sequence signature for the real PPDK enzyme highlighted in blue.Sequences were aligned using Clustal Omega 1.2.3 in Geneious Prime 2022.1.1 and the colour annotations were created with BioRender.com.Numbering indicates the position of residues within the alignment.

F
I G U R E 3 Working model of the putative CO 2 delivery system to the chromatophore localized α-carboxysome in Paulinella (based on Figure 1), showing the results of differential gene expression analysis.Expression pattern of CCM-related genes in (A) high light (HL; induced condition), (B) added bicarbonate (BC; inhibition condition) under low light, (C) double treatment (HLBC), and (D) under low light, pH shifted from 6.8 to 8 (pH) treatments.Solid black arrows indicate upregulation of expression in response to a treatment.Solid grey arrows indicate the downregulation of expression in response to treatment.An asterisk indicates significance (adjusted p-value <0.05) up-or downregulated in response to treatment.Enzymes encoded by multiple gene copies are represented by coloured squares, with copies localized to the cytoplasm represented by blue squares and copies localized to the mitochondria by orange squares.

T A B L E 2
Differential accumulation of transcripts encoding metabolite transporters in KR01.
Putative biochemical CCM-related gene modules in KR01.
T A B L E 1 Significantly (p-adjusted <0.05) upregulated (orange) and downregulated (blue) genes are highlighted. Note: